home *** CD-ROM | disk | FTP | other *** search
- Path: dd.chalmers.se!news.chalmers.se!sunic!EU.net!howland.reston.ans.net!cs.utexas.edu!swrinde!sgiblab!news.cs.indiana.edu!news.Arizona.EDU!math.arizona.edu!CS.Arizona.EDU!not-for-mail
- From: dave@CS.Arizona.EDU (Dave Schaumann)
- Newsgroups: comp.programming
- Subject: Re: Hash function for strings
- Date: 4 Mar 1994 17:04:23 -0700
- Organization: University of Arizona CS Department, Tucson AZ
- Lines: 64
- Message-ID: <2l8ia7$97d@caslon.CS.Arizona.EDU>
- References: <amundsj.1.0@novell.oih.no> <2l6kis$8jp@scax18.pki-nbg.philips.de>
- NNTP-Posting-Host: caslon.cs.arizona.edu
-
- In article <2l6kis$8jp@scax18.pki-nbg.philips.de>,
- Frank Munkert <ln_fmu@sle20.pki-nbg.philips.de> wrote:
- |
- |In article <amundsj.1.0@novell.oih.no>, amundsj@novell.oih.no (AMUNDSEN JARLE/3AA/D) writes:
- |> I am trying to find a good hash function for strings, names that is, and
- |> wonder if anyone has an idea on how good such a function can hash. Given
- |> that the keys are not known in advance.
- |
- |The performance of several different hash function is discussed
- |in "Aho, Sethi, Ullman: Compilers - Principles, Techniques and Tools;
- |Addison-Wesley, Reading, 1986."
-
- The hash function they use that performs the best is this (well,
- modified just a bit):
-
- int hash_pjw( char *s, unsigned int size ) {
- char *p ; unsigned int h = 0, g ;
-
- for( p = s ; *p != '\0' ; p++ ) {
- h = (h << 4) + *p
- if( (g = h & 0xf0000000) ) { /* note: assumes 32-bit ints */
- h = h ^ (g >> 24) ; /* here, too */
- h = h ^ g ;
- }
- }
-
- return h % size ;
- }
-
- I've used this with very good results for prime-sized hash tables and
- internal chaining. For instance, here's the stats for loading /usr/dict/words
- into my hash table using hash_pjw:
-
- number of rehashes : 9
- population : 24483
- table size : 55609
- load factor : 0.440270
- total nodes searched : 73433
- total search calls : 52216
- Avg nodes searched per call: 1.406331
- probes keys %
- ------ ----- ------
- 1 18982 77.53
- 2 3498 91.82
- 3 1218 96.79
- 4 449 98.63
- 5 182 99.37
- 6 83 99.71
- 7 36 99.86
- 8 15 99.92
- 9 7 99.95
- 10 6 99.97
- 11 5 99.99
- 12 1 100.00
- 15 1 100.00
-
- Note that total search calls > population because of rehashing.
- The hash table starts out at a size of 101, which is probably too
- small, particularly if your planning on loading /usr/dict/words into
- the hash table very often :-). I can also get a little better behavior
- by tweaking the probe function.
-
- --
- Dave Schaumann dave@cs.arizona.edu
-